In vaccine studies for infectious diseases such as human immunodeficiency
virus (HIV), the frequency and type of contacts between
study participants and infectious sources are among the most in- 
formative risk factors, but are often not adequately adjusted for in 
standard analyses. Such adjustment can improve the assessment of 
vaccine efficacy as well as the assessment of risk factors. It can be at- 
tained by modeling transmission per contact with infectious sources. 
However, information about contacts that relies on self-reporting by
study participants is subject to nontrivial measurement error in
many studies. We develop a Bayesian hierarchical model fitted using
Markov chain Monte Carlo (MCMC) sampling to estimate the
vaccine efficacy controlled for exposure to infection, while adjusting 
for measurement error in contact-related factors. Our method is used 
to re-analyze two recent HIV vaccine studies, and the results are 
compared with the published primary analyses that used standard 
methods. The proposed method could also be used for other vaccines 
where contact information is collected, such as human papilloma virus 
vaccines. 

1. Introduction. Two randomized multi-center Phase III preventive HIV 
vaccine trials were conducted to evaluate the efficacy of two versions of 
AIDSVAX, a recombinant glycoprotein 120 (rgp120) vaccine developed by
VaxGen and designed to provide protective immunity by inducing antibody 
response. One trial (VAX004) was conducted in adults at risk of sexual trans- 
mission in North America and the Netherlands, launched in June, 1998, and 
the other (VAX003) in injecting drug users (IDUs) in Bangkok, Thailand, 
started in March, 1999. In analyses using Cox proportional hazards models, 
the vaccine was shown to be ineffective by Gurwith et al. (2005) for
VAX004 and by Pitisuttithum et al. (2006) for VAX003.

A general definition of vaccine efficacy is VE = 1 - RR, where RR is
the relative risk of infection for a vaccinated subject compared to that 
for a control subject. Depending on how risk is defined, various VE mea- 
sures can be derived. The most frequently used measures were classified by 
Halloran, Struchiner and Longini (1997) into two categories: conditional on 
exposure to infection and unconditional, that is, according to whether the
measure controls for the frequency and type of contacts that lead to transmission.
A contact can be defined as one sexual act of a certain type in the context 
of VAX004 and as one act of sharing a needle for drug injection in VAX003. 
The VE measure used in Gurwith et al. (2005) and in Pitisuttithum et al. 
(2006) falls in the unconditional category. It is of public health interest to re- 
analyze the two vaccine trials using a VE measure conditional on exposure 
to infection. 

For proper inference conditional on exposure to infection, measurement er- 
ror in exposure factors should be taken into account. For example, the num- 
bers of needle-sharing acts are often under-reported when IDUs are inter- 
viewed [Hudgens et al. (2002)]. Thus, methods depending solely on reported 
exposure information could be inappropriate. To handle the problem of mea- 
surement error, many methods have been introduced [Carroll, Ruppert and 
Stefanski (1995)]. In the nonparametric setting, Fan and Truong (1993) ex- 
plored the properties of globally consistent nonparametric regression using 
deconvolution kernels. Cook and Stefanski (1994) and Carroll et al. (1996) 
developed the simulation extrapolation method that imposes no assump- 
tion on the covariates measured with error and uses resampling to detect 
the trend of measurement error. Richardson and Green (1997) discussed the 
use of mixture priors for covariates measured with error in the Bayesian 
framework, and this method was extended to epidemiological studies with 
a validation set [Richardson et al. (2002)]. In these two vaccine trials, the 
exposure factors that are subject to measurement error and that are most 
vital to parameter estimation are the frequencies and the types of contacts. 

In this paper we develop a Bayesian framework under the simple assump- 
tion of conditional independence [Richardson and Gilks (1993)] for infec- 
tious disease incidence data with contact frequency and type recorded for 
each observation. Using this Bayesian model, we re-analyze the data from 
the two AIDSVAX trials. Our primary focus is to estimate the transmission 
probability and vaccine efficacy per infectious contact, while adjusting for 
measurement error in contact frequency and type. In addition, these studies 
provide information to address the following questions that are useful for 
understanding HIV transmission: 

• Is VE modified by the baseline behavioral risk profile?

• Is the use of condoms in sexual contacts protective?

• Is sharing needles more risky in prison than in the general public?

• Is one subtype of HIV more infectious than another subtype via shared
needle injection?

Table 1

Two randomized multi-center trials conducted for evaluating the efficacy of AIDSVAX,
a recombinant glycoprotein 120 HIV-1 vaccine

                        VAX004                              VAX003
Time of trial           1998-2002                           1999-2003
Location                North America and The Netherlands   Bangkok, Thailand
Type of transmission    Sexual acts                         Sharing needles for drug injection
Population size         5403                                2527
  Male                  5095 (94%)                          2361 (93%)
  Female                308 (6%)                            166 (7%)
Randomization ratio     2:1                                 1:1
  (vaccine:placebo)
Infected/Randomized
  Placebo               127/1805                            105/1260
    Male                123/1704                            101/1170
    Female              4/101                               4/90
  Vaccine               241/3598                            106/1267
    Male                239/3391                            100/1191
    Female              2/207                               6/76
HIV-1 subtypes
  B                     100%                                33 (16%)
  E                                                         164 (78%)
  Untypeable                                                14 (6%)

The results are compared to those obtained in Gurwith et al. (2005), 
Pitisuttithum et al. (2006) and Hudgens et al. (2002). 

2. Data description. Basic characteristics of the two trials are presented 
in Table 1. The two trials had similar designs except for the ratio of vaccine
to placebo recipients. Each subject was enrolled free of HIV infection and 
received seven injections (study vaccine or placebo) at months 0, 1, 6 and 
every six months thereafter up to month 30. At each immunization visit 
and the final visit at month 36, antibody assays of blood samples were 
performed, and exposure factors, adverse events and social harm events for 
each participant in the past six months were collected. The primary endpoint 
of the trials was the detection of HIV-1 infection that is defined as both a 
positive HIV-1 enzyme immunoassay antibody test and the development of 
at least two new nonvaccine bands on confirmatory HIV immunoblot. 

For trial VAX004, in addition to vaccine status, exposure factors were col- 
lected in the form of sexual contact frequencies categorized by the behavioral 
type of the contact (vaginal, oral or anal), gender of the partner, the infection
status of partners reported by the subject (HIV-positive, HIV-negative
or unknown), and condom use. To reduce the dimension of parameters, we 
ignore the effects of behavioral type and gender on transmission probabili- 
ties by summing the frequencies over the corresponding categories. As the 
study participants were mostly men who have sex with men (MSM), with
females accounting for only 6% of the population and 1.6% of the infections, 
we are largely assessing transmission via MSM contacts. 

For trial VAX003, the exposure factors of interest are the frequency of 
injections, the fraction of injections using needles shared with other people, 
the history of injection in jail or prison (incarceration injection), and the 
vaccine status. Since one of two HIV-1 subtypes (E and B) was found for 
most infections, it is possible to estimate the transmission probability and 
vaccine efficacy for each of the two subtypes, given that reasonable estimates 
of the prevalences of these subtypes among the IDUs in Bangkok, Thailand, 
are available. Contact information collected in this study is not as detailed 
as in VAX004. Both the injection frequency and the fraction using shared 
needles were reported as a few categories instead of numbers. There are 
four categories for the injection frequency (none, < 1/week, ≥ 1/week but <
1/day, and ≥ 1/day), to which we assign values $10^{-10}$/day, 0.5/week, 4/week
and 1/day respectively. There are five categories for the fraction of injections
using shared needles (none, occasionally, half of the time, most and always), 
to which we assign values 0.5%, 15%, 50%, 85% and 99.5% respectively. 
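
The category coding described above can be sketched in Python as follows. This is only an illustration: the dictionary names, the helper `expected_shared_injections` and the 182-day interval length are assumptions introduced here, not part of the trial data or of the authors' software.

```python
# Numeric values assigned to the VAX003 reporting categories (taken from the text).
# Rates are converted to injections per day. All names are illustrative.
INJECTION_RATE_PER_DAY = {
    "none": 1e-10,
    "<1/week": 0.5 / 7,
    "1/week to <1/day": 4 / 7,
    ">=1/day": 1.0,
}
SHARED_FRACTION = {
    "none": 0.005,
    "occasionally": 0.15,
    "half of the time": 0.50,
    "most": 0.85,
    "always": 0.995,
}


def expected_shared_injections(rate_cat, share_cat, days=182):
    """Expected number of shared-needle injections in one interval.

    `days` is an assumed interval length (roughly six months).
    """
    return INJECTION_RATE_PER_DAY[rate_cat] * days * SHARED_FRACTION[share_cat]


print(expected_shared_injections("1/week to <1/day", "half of the time"))
```

The tiny rate assigned to the "none" category keeps the implied contact rate strictly positive, which is convenient when the rate is later given a log-normal or gamma distribution.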

3. Methods. 

3.1. Model structure. Following Richardson and Gilks (1993), we specify 
three submodels for our Bayesian analysis of the measurement error problem: 
the regression submodel, the measurement error submodel and the prior 
submodel. In the type of study we are considering, risk factors and infection 
status are obtained for each subject over consecutive six-month intervals. 
Let N be the total number of study participants and $T_i$ be the number of
intervals of subject i, i = 1, ..., N. We use data collected from month 6 to
month 36, excluding month 0 as an adjustment for left truncation. Visits
after the first with positive HIV detection are also excluded from analysis.
For notational convenience, we identify the tth interval of subject i by (i,t). 

3.1.1. The regression submodel. Let $p_0$ be the baseline transmission
probability per infectious contact. An infectious contact refers to a contact
with an infectious source. Let $n_{it}$ be the number of contacts and
$x_{itj} = (x_{itj1}, \ldots, x_{itjK})^T$ be the vector of K covariates associated with the jth
contact in interval (i,t), $j = 1, \ldots, n_{it}$. The covariates associated with a contact
may include characteristics of the subject (e.g., vaccine status), the partner
(e.g., infection status) and the contact itself (e.g., condom use, incarceration,
etc.). To associate the transmission probability with covariates, we consider
a logit model:

(1)    $p(x_{itj}) = \mathrm{logit}^{-1}(\mathrm{logit}(p_0) + x_{itj}^T \theta)$,

where $\theta = (\theta_1, \ldots, \theta_K)^T$ is the coefficient vector with the interpretation that
$\exp(\theta_k)$ is the multiplicative change in odds of transmission per unit increase in $x_{itjk}$ or
the odds ratio (OR) for $x_{itjk} = 1$ relative to $x_{itjk} = 0$ if $x_{itjk}$ is binary. Other
regression submodels such as the complementary log-log could also be used.
Also frequently used is the multiplicative submodel $p(x_{itj}) = p_0 \exp\{x_{itj}^T \theta\}$.
However, it is sometimes difficult to guarantee $p(x_{itj}) < 1$ when $p_0$ and $\theta$ are
simultaneously sampled. In the context of the two AIDSVAX trials, we use
$\mathrm{OR}_{vac}$, $\mathrm{OR}_{con}$ and $\mathrm{OR}_{inc}$ to denote the odds ratios of transmission per
infectious contact for vaccination, condom use and incarceration, respectively.
The probability of escaping infection in interval (i,t) is

(2)    $Q_{it} = \prod_{j=1}^{n_{it}} (1 - p(x_{itj}) \pi(x_{itj}))$,

where $\pi(x_{itj})$ is the prevalence of infectious contacts among all contacts with
covariates $x_{itj}$. As $p(x_{itj})$ and $\pi(x_{itj})$ always appear as a product, they are
not estimable at the same time, and $\pi(x_{itj})$ is often assumed known and
evaluated from either literature or the data.
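
Expressions (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the single binary covariate with odds ratio 0.5, the baseline probability 0.01 and the 10% prevalence are all hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def transmission_prob(p0, theta, x):
    """Expression (1): per-contact transmission probability for covariates x."""
    return inv_logit(logit(p0) + sum(t * xi for t, xi in zip(theta, x)))


def escape_prob(p0, theta, contacts, prevalence):
    """Expression (2): probability of escaping infection over all contacts.

    contacts:   list of covariate vectors, one per contact in the interval.
    prevalence: function mapping a covariate vector to pi(x).
    """
    q = 1.0
    for x in contacts:
        q *= 1.0 - transmission_prob(p0, theta, x) * prevalence(x)
    return q


# Hypothetical inputs: one binary covariate with odds ratio 0.5, 10% prevalence.
theta = [math.log(0.5)]
contacts = [[1], [1], [0]]
print(escape_prob(0.01, theta, contacts, lambda x: 0.10))
```

Because the transmission probability and the prevalence enter only through their product, doubling one and halving the other leaves the escape probability unchanged, which is the non-identifiability noted in the text.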

As mentioned in the introduction, different measures can be used for vac- 
cine efficacy, depending on the definition of relative risks. A natural choice 
is the VE per infectious contact with the risks being transmission probabili- 
ties per infectious contact as given in (1). However, the relative risk obtained 
from transmission probabilities per infectious contact depends on not only 
the vaccine status but also other covariates. Such dependency may not exist
in different models. For example, if we assume a multiplicative model
$p(x_{itj}) = p_0 \exp(x_{itj}^T \theta)$, the VE per infectious contact will depend solely on
the vaccine status. For the logit model, the dependency could also be minimal
if $p(x_{itj})$ is small, where we have VE per infectious contact $\approx 1 - \mathrm{OR}_{vac}$.
The approximation holds for the contact types we consider here, and thus,
we report $1 - \mathrm{OR}_{vac}$ as the VE per infectious contact for the data analysis.
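
The quality of this approximation can be checked numerically. The sketch below computes the exact VE = 1 - RR implied by the logit model for several baseline probabilities. The odds ratio of 0.5 and the baseline values are hypothetical and chosen only for illustration.

```python
def ve_per_contact(p_base, odds_ratio):
    """Exact VE = 1 - RR under the logit model for a given baseline probability."""
    odds_vaccinated = odds_ratio * p_base / (1.0 - p_base)
    p_vaccinated = odds_vaccinated / (1.0 + odds_vaccinated)
    return 1.0 - p_vaccinated / p_base


# With a hypothetical odds ratio of 0.5, 1 - OR equals 0.5 regardless of baseline.
for p_base in (0.005, 0.05, 0.5):
    print(p_base, round(ve_per_contact(p_base, 0.5), 4))
```

For a baseline probability well below 1%, the exact VE and 1 - OR agree to within a fraction of a percentage point, whereas at a baseline of 0.5 the exact VE drops to one third.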

Expressions (1) and (2) provide a general form for the regression sub- 
model. The exact form is specific to each study, depending on the covariates 
under consideration, and is described below. 

The North America and Netherlands trial (VAX004). For trial VAX004,
we are interested in the effects of vaccine and condom usage. Let $v_i$ indicate
the vaccine status (1: yes, 0: no) and $c_{itj}$ indicate the condom use (1: yes,
0: no) for the jth sexual contact in interval (i,t). Let $p_0$ be the transmission
probability for a sexual contact without a condom between a placebo
recipient and an infected partner. We assume the prevalence, $\pi$, of HIV in
contacts is identical for all intervals and is known. The escape probability
for interval (i,t) is given by

(3)    $Q_{it} = \prod_{j=1}^{n_{it}} (1 - p(v_i, c_{itj}) \pi) = (1 - p(v_i, 1) \pi)^{m_{it}} (1 - p(v_i, 0) \pi)^{n_{it} - m_{it}}$,

where $p(v_i, c_{itj}) = \mathrm{logit}^{-1}(\mathrm{logit}(p_0) + \theta_v v_i + \theta_c c_{itj})$, $\theta_v$ and $\theta_c$ are the effects
of the vaccine and condom use, and $m_{it} = \sum_{j=1}^{n_{it}} c_{itj}$ is the total number of
contacts with a condom. The probability distribution of the final transmission
status, $y_{it}$ (1: infection, 0: escape), is then

(4)    $\Pr(y_{it} \mid n_{it}, m_{it}, v_i; p_0, \theta_v, \theta_c) = Q_{it}^{1 - y_{it}} (1 - Q_{it})^{y_{it}}$.
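
As a hedged illustration of expressions (3) and (4), the following Python sketch evaluates the escape probability and the likelihood contribution of a single interval. The function names and the example inputs (40 contacts, 20 with a condom, effect sizes) are hypothetical. Only the prevalence 0.06 is a value quoted in the paper.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def escape_prob_vax004(n, m, v, p0, theta_v, theta_c, prev):
    """Expression (3): m contacts with a condom, n - m without."""
    base = logit(p0) + theta_v * v
    p_condom = inv_logit(base + theta_c)
    p_no_condom = inv_logit(base)
    return (1.0 - p_condom * prev) ** m * (1.0 - p_no_condom * prev) ** (n - m)


def interval_likelihood(y, n, m, v, p0, theta_v, theta_c, prev):
    """Expression (4): Bernoulli likelihood of the infection indicator y."""
    q = escape_prob_vax004(n, m, v, p0, theta_v, theta_c, prev)
    return q if y == 0 else 1.0 - q


# Hypothetical interval: vaccinated subject, 40 contacts, 20 with a condom.
args = dict(n=40, m=20, v=1, p0=0.005, theta_v=math.log(0.9),
            theta_c=math.log(0.8), prev=0.06)
print(interval_likelihood(0, **args), interval_likelihood(1, **args))
```

Collapsing the product over contacts into two powers is what makes the likelihood depend on the data only through the counts n and m, so individual contact records are not needed.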

The Thai trial (VAX003). For this trial, we consider vaccine status,
incarceration history of the subject and needle-sharing as covariates. Let $p_0$ be
the baseline probability of infection by an injection using a needle shared
with an HIV-infected person. Let $u_i$ denote whether the subject had
incarceration injection (1: yes, 0: no) during the study, and $s_{itj}$ denote whether
the injection was using a shared needle (0: yes, 1: no). Also define $\theta_v$, $\theta_u$
and $\theta_s$ as the effects of the covariates, respectively. We assume that
injections using nonshared needles were not infectious. That is, $\theta_s = -\infty$, and
the regression submodel is built solely on the $m_{it} = \sum_{j=1}^{n_{it}} (1 - s_{itj})$ contacts
using shared needles. The probability of escaping infection in interval (i,t)
is given by

(5)    $Q_{it} = \prod_{j=1}^{m_{it}} (1 - p(v_i, u_i, s_{itj}) \pi) = (1 - p(v_i, u_i, 0) \pi)^{m_{it}}$,

where $p(v_i, u_i, s_{itj}) = \mathrm{logit}^{-1}(\mathrm{logit}(p_0) + \theta_v v_i + \theta_u u_i + \theta_s s_{itj})$. The probability
distribution of the final transmission status is the same as (4).

As the HIV subtype was determined for most infected subjects, it is
possible to estimate the transmission probability and vaccine efficacy for each
subtype. Let $p_0^{(E)}$ ($p_0^{(B)}$) be the baseline probability of infection by an injection
using a needle shared with somebody infected with HIV of subtype E (B),
$\theta_v^{(E)}$ ($\theta_v^{(B)}$) be the vaccine effects against transmission of subtype E (B), and
$\pi^{(E)}$ ($\pi^{(B)}$) be the prevalence of people infected with subtype E (B) among
the IDU population. The probabilities of escaping infection from injections
using needles shared with infected partners of subtype E and subtype B,
respectively, are given by

$Q_{it}^{(E)} = (1 - \mathrm{logit}^{-1}(\mathrm{logit}(p_0^{(E)}) + \theta_v^{(E)} v_i + \theta_u u_i) \pi^{(E)})^{m_{it}}$

and

$Q_{it}^{(B)} = (1 - \mathrm{logit}^{-1}(\mathrm{logit}(p_0^{(B)}) + \theta_v^{(B)} v_i + \theta_u u_i) \pi^{(B)})^{m_{it}}$.

We assume transmission of subtype E is independent of transmission of
subtype B. As infection by both subtypes is rare, we assume an infected subject
typed as E (B) must have escaped transmission from infectious contacts of
subtype B (E). The probability distribution of the final transmission status
can be expressed as

(6)    $\Pr(y_{it}, \mathrm{subtype} \mid \cdot) =$
           $Q_{it}^{(E)} Q_{it}^{(B)}$,            $y_{it} = 0$,
           $Q_{it}^{(B)} (1 - Q_{it}^{(E)})$,      $y_{it} = 1$, subtype = E,
           $Q_{it}^{(E)} (1 - Q_{it}^{(B)})$,      $y_{it} = 1$, subtype = B,
           $1 - Q_{it}^{(E)} Q_{it}^{(B)}$,        $y_{it} = 1$, subtype = U,

where "U" stands for "Untypeable."
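
The subtype-specific outcome probabilities in (6) can be sketched as below. The function names and the baseline probabilities are hypothetical. The prevalences 0.225 and 0.075 are the values the paper later adopts for subtypes E and B.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))


def escape_subtype(m, v, u, p0_s, theta_v_s, theta_u, prev_s):
    """Escape probability over m shared-needle injections for one subtype."""
    p = inv_logit(logit(p0_s) + theta_v_s * v + theta_u * u)
    return (1.0 - p * prev_s) ** m


def outcome_probs(q_e, q_b):
    """Expression (6): probability of each observable outcome in an interval."""
    return {
        "escape": q_e * q_b,
        "E": q_b * (1.0 - q_e),   # infected by E, escaped B
        "B": q_e * (1.0 - q_b),   # infected by B, escaped E
        "U": 1.0 - q_e * q_b,     # infected, subtype not determined
    }


# Hypothetical interval: 30 shared injections, vaccinated, no incarceration injection.
q_e = escape_subtype(30, 1, 0, 0.01, 0.0, 0.0, 0.225)
q_b = escape_subtype(30, 1, 0, 0.005, 0.0, 0.0, 0.075)
print(outcome_probs(q_e, q_b))
```

The four entries are not meant to sum to one: the "U" entry is the probability of infection by either subtype and is used only when the subtype could not be determined, so "escape" and "U" are complementary.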



3.1.2. The measurement error submodel. We consider two types of
exposure information that are measured with error, the total number of
contacts, $n_{it}$, and the number of a particular subset of contacts, $m_{it}$. Let $\tilde{n}_{it}$
and $\tilde{m}_{it}$ be the measured values of $n_{it}$ and $m_{it}$, respectively. As data in
the form of counts over time periods often arise from a Poisson process,
we assume a Poisson distribution for the true number of contacts $n_{it}$ and
an over-dispersed Poisson distribution for the measured number $\tilde{n}_{it}$ during
a time interval of length $l_{it}$, given the contact rate $\lambda_{it}$. The reason for an
over-dispersion structure is that we want some correction for potentially
under- or over-reported numbers of contacts; for example, the number of
sexual contacts in a single interval was reported as thousands by several
subjects in trial VAX004. The histograms of reported contact rates in
Figure 1(a) for VAX004 and Figure 1(c) for VAX003 suggested either gamma
or log-normal distributions. We use the log-normal distribution for
illustration, but compare both in the data analyses. Define $n_i = (n_{i1}, \ldots, n_{iT_i})^T$,
$m_i = (m_{i1}, \ldots, m_{iT_i})^T$, $\tilde{n}_i = (\tilde{n}_{i1}, \ldots, \tilde{n}_{iT_i})^T$, $\tilde{m}_i = (\tilde{m}_{i1}, \ldots, \tilde{m}_{iT_i})^T$ and
$\lambda_i = (\lambda_{i1}, \ldots, \lambda_{iT_i})^T$. Let 1 and J denote the vector and matrix, respectively, with
all elements being 1, and let I denote the identity matrix. The dimensions
of 1, J and I are clear from the context and are thus suppressed. We choose
the following measurement error structure for $n_{it}$:

(7)    $\lambda_i \sim \mathrm{Log\text{-}Normal}(\mu 1, \sigma^2 (\rho J + (1 - \rho) I))$,
       $n_{it} \sim \mathrm{Poisson}(\lambda_{it} l_{it})$,
       $\delta_{it} \sim \mathrm{Gamma}(\phi, \lambda_{it} l_{it} / \phi)$,
       $\tilde{n}_{it} \sim \mathrm{Poisson}(\delta_{it})$.
An exchangeable within-subject correlation structure is assumed for the
contact rates, $\lambda_i$, but other correlation structures could be considered. The
magnitude of correlation among elements of $\lambda_i$ is measured by $\rho$, $0 < \rho < 1$, the
correlation coefficient for $\log(\lambda_i)$. We assume unbiasedness for the
measurement error, as $E(\tilde{n}_{it} \mid \lambda_{it}) = \lambda_{it} l_{it} = E(n_{it} \mid \lambda_{it})$. The over-dispersion is reflected
by $\mathrm{Var}(\tilde{n}_{it} \mid \lambda_{it}) = \lambda_{it} l_{it} (1 + \lambda_{it} l_{it} / \phi)$ and is generated by adding the layer
of $\delta_i = (\delta_{i1}, \ldots, \delta_{iT_i})^T$. The degree of over-dispersion decreases as $\phi$ goes to
infinity. By our assumption, $n_{it}$ is conditionally independent of $\tilde{n}_{it}$ given
the contact rate $\lambda_{it}$. Zero values of $\tilde{n}_{it}$ are allowed for intervals in which
infections happened since only $n_{it}$ is required to be nonzero.
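
The mean and variance of the reported count implied by the gamma and Poisson layers can be checked by simulation. The sketch below uses only the Python standard library. The Poisson sampler is Knuth's simple multiplication method (adequate only for moderate means), and the values chosen for the expected count and for the over-dispersion parameter are hypothetical.

```python
import math
import random


def poisson(mean, rng):
    """Knuth's multiplication method; fine for the moderate means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def reported_count(expected, phi, rng):
    """delta ~ Gamma(shape=phi, scale=expected/phi), then count ~ Poisson(delta)."""
    delta = rng.gammavariate(phi, expected / phi)
    return poisson(delta, rng)


rng = random.Random(1)
expected, phi = 8.0, 2.0    # hypothetical rate-times-length and over-dispersion
draws = [reported_count(expected, phi, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
theory_var = expected * (1.0 + expected / phi)
print(sample_mean, sample_var, theory_var)
```

The sample mean should sit near the expected count (unbiasedness), while the sample variance should sit near the theoretical value, which is five times the Poisson variance under these assumed inputs.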

Given $n_{it}$ and $\tilde{n}_{it}$, it is natural to choose binomial distributions for both
the true number $m_{it}$ and the measured number $\tilde{m}_{it}$ based on a beta-distributed
proportion, which is also suggested by the histograms of reported
proportions of contacts with condom use in Figure 1(b) for VAX004 and contacts
with needle-sharing in Figure 1(d) for VAX003. Define $\Phi(\cdot)$ as the standard
normal cumulative distribution function (CDF) and $\Psi(\cdot \mid \alpha, \beta)$ as the beta
CDF. We have

(8)    $\xi_{it} \sim \mathrm{Beta}(\alpha, \beta)$,
       $m_{it} \sim \mathrm{Binomial}(n_{it}, \xi_{it})$,
       $\tilde{m}_{it} \sim \mathrm{Binomial}(\tilde{n}_{it}, \xi_{it})$,
       $\Phi(\varepsilon_{it}) = \Psi(\xi_{it} \mid \alpha, \beta)$,
       $\varepsilon_i \sim N(0, \gamma J + (1 - \gamma) I)$,

where $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iT_i})^T$. We use a standard normal copula to model the
within-subject correlation among $\xi_i = (\xi_{i1}, \ldots, \xi_{iT_i})^T$, the proportions of
contacts in a subcategory (condom use or needle-sharing). This copula is formed
by generating a standard normal random vector $\varepsilon_i$ with an exchangeable
correlation structure, the correlation coefficient being $\gamma$, and transforming
it to a uniform random vector using $\Phi$ on each component. The uniform
random vector is then transformed to $\xi_i$ using $\Psi^{-1}$ on each element. The
$\xi_i$ generated in this way has marginal CDF $\Psi(\cdot \mid \alpha, \beta)$ and an exchangeable
correlation structure. While the correlation coefficient for $\xi_i$ is not the same
as that for $\varepsilon_i$, they share the same rank correlation because the CDFs are
monotonic. Note that the log-normal distribution can be viewed as a special
case utilizing the standard normal copula. Conditional on $n_{it}$, $\tilde{n}_{it}$ and $\xi_{it}$,
$m_{it}$ and $\tilde{m}_{it}$ are independent.

[Figure 1: four histograms. Panel annotations: (a) Q1: 0.02, median: 0.07,
Q3: 0.19; (b) Q1: 0.08, median: 0.6, Q3: 1.0; (c) categories < 1/week,
1/week to 1/day, ≥ 1/day; (d) categories 0-0.01, 0.02-0.25, 0.26-0.74,
0.75-0.99, 0.99-1.]

Fig. 1. (a) Reported sexual contact rates in VAX004. Values larger than 5/day (<0.1%)
are truncated in the graph but not in the analysis. The vertical line segments indicate the
location of values between 1 and 5. (b) Reported proportions of condom use in VAX004.
(c) Reported injection rates in VAX003. (d) Reported proportions of shared needles in
VAX003.
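
The copula construction can be sketched with the Python standard library alone. To avoid depending on an external beta quantile routine, this illustration restricts the beta shape parameters to positive integers, for which the CDF has an exact binomial-tail form, and inverts it by bisection. That restriction, the function names and the parameter values are assumptions made for the example and are not part of the model.

```python
import math
import random
from statistics import NormalDist


def beta_cdf_int(x, a, b):
    """Beta(a, b) CDF for positive integer a and b (binomial-tail identity)."""
    n = a + b - 1
    return sum(math.comb(n, j) * x ** j * (1 - x) ** (n - j)
               for j in range(a, n + 1))


def beta_quantile_int(u, a, b, tol=1e-10):
    """Invert the beta CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf_int(mid, a, b) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def copula_proportions(T, a, b, gamma, rng):
    """Draw T proportions with Beta(a, b) marginals and an exchangeable normal copula."""
    shared = rng.gauss(0.0, 1.0)
    # Each component has unit variance and pairwise correlation gamma.
    eps = [math.sqrt(gamma) * shared + math.sqrt(1.0 - gamma) * rng.gauss(0.0, 1.0)
           for _ in range(T)]
    std_normal = NormalDist()
    return [beta_quantile_int(std_normal.cdf(e), a, b) for e in eps]


rng = random.Random(0)
print(copula_proportions(6, 2, 3, 0.65, rng))
```

A shared normal component plus independent noise is one simple way to obtain the exchangeable correlation matrix. Any other sampler for that multivariate normal would serve equally well.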

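3.1.3. The prior submodel. We use the following priors for $p_0$, $\theta$ and
hyperparameters:

(9)    $f(\mu) \propto 1$,
       $f(\sigma^2) \propto 1/\sigma^2$,
       $\rho \sim \mathrm{Uniform}(0, 1)$,
       $f(\phi) \propto [\ln\Gamma''(\phi) - 1/\phi]^{1/2}$,
       $f(\alpha, \beta) \propto [\ln\Gamma''(\alpha) \ln\Gamma''(\beta) - \ln\Gamma''(\alpha + \beta)(\ln\Gamma''(\alpha) + \ln\Gamma''(\beta))]^{1/2}$,
       $\gamma \sim \mathrm{Uniform}(0, 1)$,
       $\theta_k \sim \mathrm{Normal}(0, d_k^2)$, $k = 1, \ldots, K$,
       $p_0 \sim \mathrm{Uniform}(a_p, b_p)$,

where $\{d_k : k = 1, \ldots, K\}$, $a_p$ and $b_p$ are assumed known, and $\ln\Gamma''(\cdot)$ is the
trigamma function. Jeffreys' noninformative priors are used for $\mu$, $\sigma^2$, $\phi$ and
$(\alpha, \beta)$.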
Our choice of a relatively wide range $(a_p, b_p)$ is guided by the maximum
likelihood estimate (MLE) of $p_0$ obtained solely from the regression
submodel. To use this simple likelihood method, we assume $n_{it} = \tilde{n}_{it}$, and $m_{it}$
is estimated by $\tilde{n}_{it} \times \sum_{it} \tilde{m}_{it} / \sum_{it} \tilde{n}_{it}$ for VAX003. The same assumption
of a common proportion of shared needles was employed in Hudgens et al.
(2002). However, one will not be able to differentiate the condom effect
with a common proportion of condom use, and thus, we assume $m_{it} = \tilde{m}_{it}$
additionally to obtain the MLE of $p_0$ for VAX004.

A normal prior $N(0, d_k^2)$ is reasonable for covariate effects because we let
the data drive the 95% credible sets away from the null value if strong effects
exist. The values of $\{d_k : k = 1, \ldots, K\}$ are set relatively large, for example,
2, to provide a wide domain for the odds ratios.
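
To see how wide this prior is on the odds-ratio scale, the central 95% prior interval can be computed directly. The short sketch below assumes the example value 2 is the prior standard deviation of the log odds ratio. The variable names are illustrative.

```python
import math
from statistics import NormalDist

d_k = 2.0                               # assumed prior standard deviation of theta_k
z = NormalDist().inv_cdf(0.975)         # upper 2.5% point of the standard normal
lower, upper = math.exp(-z * d_k), math.exp(z * d_k)
print(round(lower, 4), round(upper, 1))
```

Under this assumption the prior places 95% of its mass on odds ratios spanning roughly three and a half orders of magnitude, symmetric about 1 on the log scale, so it is centered on the null while barely constraining the estimate.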

3.2. Posterior distributions. Bayesian inferences are based on posterior 
distributions of all unknown parameters and latent variables given the data 
and known parameters, which are derived from the prior and conditional 
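distributions stated in the previous section. Let $y = (y_1^T, \ldots, y_N^T)^T$ be the
vector of observed infection status, where $y_i = (y_{i1}, \ldots, y_{iT_i})^T$, and let
$x = (x_1^T, \ldots, x_N^T)^T$, where $x_i = (x_{i1}^T, \ldots, x_{iT_i}^T)^T$ and $x_{it} = (x_{it1}, \ldots, x_{itn_{it}})^T$, be the
observed covariate matrix for all intervals. Similarly, define $n$, $\tilde{n}$, $m$, $\tilde{m}$, $\lambda$,
$\delta$, $\xi$ and $\varepsilon$ as the vectors of $n_{it}$, $\tilde{n}_{it}$, $m_{it}$, $\tilde{m}_{it}$, $\lambda_{it}$, $\delta_{it}$, $\xi_{it}$ and $\varepsilon_{it}$, $t = 1, \ldots, T_i$,
$i = 1, \ldots, N$. Let $f(\cdot)$ denote the probability density function (PDF) for
continuous variables and the probability mass function (PMF) for discrete
variables. The joint posterior distribution of all unknown parameters and
latent variables is proportional to the joint full probability of the unknown
parameters, latent variables and the data:

(10)   $f(n, m, \delta, \lambda, \varepsilon, \xi, p_0, \theta, \phi, \mu, \sigma^2, \rho, \alpha, \beta, \gamma \mid y, x, \tilde{n}, \tilde{m})$
       $\propto f(y, n, \tilde{n}, m, \tilde{m}, \delta, \lambda, \varepsilon, \xi, p_0, \theta, \phi, \mu, \sigma^2, \rho, \alpha, \beta, \gamma \mid x)$
       $= f(y \mid n, m, p_0, \theta, x) \times f(m \mid n, \xi) \times f(\tilde{n} \mid \delta) \times f(\tilde{m} \mid \tilde{n}, \xi)$
       $\times f(n \mid \lambda) \times f(\delta \mid \lambda, \phi) \times f(\lambda \mid \mu, \sigma^2, \rho) \times f(\varepsilon \mid \gamma)$
       $\times f(\mu) \times f(\sigma^2) \times f(\rho) \times f(\phi) \times f(\alpha) \times f(\beta)$
       $\times f(\gamma) \times f(p_0) \times f(\theta)$,

where $\xi$ exists as a function of $\varepsilon$ given in (8), and known hyper-parameters
are suppressed.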

To illustrate the MCMC algorithm used to obtain the joint posterior dis- 
tribution of all parameters, we use VAX004 as an example and give the 
technical details in the appendix. In summary, we use the following strate- 
gies: 
• $n$, $m$, $\delta$, $\mu$ and $\sigma^2$ are sampled directly from their full conditional
distributions.

• For $\lambda$, $\xi$ and $\varepsilon$, the full conditional distribution is a product of several
regular density functions, and we use Metropolized independence sampling
with each density sequentially serving as the proposal distribution.

• The random-walk style Metropolis-Hastings algorithm is used for sam- 
pling all other parameters. 
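
As an illustration of the random-walk Metropolis-Hastings step only, the sketch below samples the baseline transmission probability in a deliberately simplified setting: contact counts are treated as known without error, there are no covariates, and the prevalence is fixed. The grouped toy data, function names, step size and chain length are all hypothetical, and this is not the authors' sampler, whose details are in their appendix.

```python
import math
import random


def log_posterior(p0, data, prev, a_p, b_p):
    """Log posterior of p0 up to a constant under a Uniform(a_p, b_p) prior.

    data: list of (contacts per interval, intervals escaped, intervals infected).
    """
    if not a_p < p0 < b_p:
        return -math.inf
    total = 0.0
    for n, escaped, infected in data:
        q = (1.0 - p0 * prev) ** n              # escape probability
        total += escaped * math.log(q) + infected * math.log(1.0 - q)
    return total


def random_walk_mh(data, prev, a_p, b_p, start, step, n_iter, rng):
    """Random-walk Metropolis-Hastings with a symmetric normal proposal."""
    chain, current = [], start
    current_lp = log_posterior(current, data, prev, a_p, b_p)
    accepted = 0
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal, data, prev, a_p, b_p)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(current)
    return chain, accepted / n_iter


toy_data = [(50, 485, 15), (100, 470, 30), (200, 440, 60)]   # hypothetical
chain, acc_rate = random_walk_mh(toy_data, 0.06, 0.0001, 0.1,
                                 start=0.005, step=0.001, n_iter=4000,
                                 rng=random.Random(2024))
kept = chain[1000:]                                          # discard burn-in
post_mean = sum(kept) / len(kept)
print(post_mean, acc_rate)
```

Proposals falling outside the support of the uniform prior receive a log posterior of negative infinity and are therefore always rejected, which keeps the chain inside the prior range without any special handling.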

4. Application. In the following, we report the posterior medians fol- 
lowed by the 95% credible sets (CS) for parameters in the Bayesian model, 
and make comparisons with point estimates followed by the 95% confidence 
intervals (CI) from the literature when appropriate. 

4.1. VAX004: HIV transmission by sexual contacts. At each semiannual 
follow-up visit in trial VAX004, subjects were asked to classify the sexual 
contacts by the infection status of their partners, that is, positive, nega- 
tive or unknown, based on their knowledge. HIV prevalence among partners 
reported as HIV-negative may be less than that among partners reported 
as HIV-positive. However, an exploratory analysis using a simple likelihood 
method showed that the probability of infection per contact was not differ- 
ent across the three types of partner infection status reported by the study 
participants. Hence, we assume a common prevalence $\pi$ of infection among
all partners and estimate it by 0.06, the proportion of reported contacts with 
positive partners among all contacts in the study population. In addition to 
the analysis for the overall study population, we performed a stratified anal- 
ysis by classifying the study population into three subgroups corresponding 
to low, medium and high baseline (month 0) risk levels. We allow the trans- 
mission probability and vaccine effect to vary across, but assume that other 
parameters are not affected by, risk levels. The baseline risk levels are determined
by a behavioral risk score ranging from 0 to 7, with 0 as low, 1-3 as
medium, and 4-7 as high. The behavioral score is derived from nine base- 
line risk factors that are highly predictive of HIV infection [Gurwith et al. 
(2005)]. 

Table 2 gives the results regarding transmission probabilities and VEs
for VAX004. The vaccine did not show a significant effect, reducing the
risk of infection per infectious contact by about 7% for the overall study
population, which is not statistically different from 0. Neither did the
low-risk and medium-risk subgroups show any significant vaccine effect. However,
we do observe a significant VE of 0.56 (95% CS: 0.22, 0.75) in the high-risk
subgroup, as the associated 95% CS excludes 0. The pattern that higher
baseline risk tends to be associated with higher vaccine efficacy was also
identified in Gurwith et al. (2005) via a Cox proportional hazards model for
grouped times, where they reported an estimate of 0.06 (95% CI: -0.17, 0.24)
for VE per six-month interval for the overall study population and 0.43 (95%
CI: 0.04, 0.66) for the high-risk subgroup, fairly close to our estimates.

Table 2

VAX004: Summary of the posterior distributions of the transmission probability and the
vaccine efficacy per infectious sexual contact for the overall study population and by
baseline risk level, compared to the standard analysis

                                  p_0                       VE (Bayesian)            VE (Cox^a)
Risk level  Total^b  Infected  Median   95% C.S.          Median   95% C.S.       Estimate  95% C.I.
Overall     8772     368       0.0056   0.0044, 0.0071    0.069    -0.15, 0.26    0.06      -0.17, 0.24
Low         3605     57        0.0020   0.0010, 0.0036    -0.23    -1.48, 0.35    -0.48     -1.93, 0.26
Middle      4546     229       0.0054   0.0041, 0.0071    0.02     -0.28, 0.25    0.03      -0.25, 0.25
High        621      82        0.020    0.013, 0.030      0.56     0.22, 0.75     0.43      0.04, 0.66

^a Results based on Cox proportional hazards model in Gurwith et al. (2005).
^b Total number of six-month intervals.

Table 3

VAX004: Summary of the posterior distributions of other parameters for the overall
study population

Posterior
quantiles   OR_con   phi    mu      sigma^2   rho    alpha   beta   gamma
Median      1.44     1.66   -2.54   1.95      0.92   0.30    0.29   0.65
2.5%        1.06     1.61   -2.58   1.87      0.91   0.29    0.28   0.64
97.5%       1.94     1.71   -2.50   2.04      0.92   0.31    0.30   0.67

The baseline transmission probability per infectious sexual contact for the 
overall study population is 0.0056 (95% CS:0.0044, 0.0071), suggesting that 
1000 sexual contacts with HIV-positive partners produce about six infections 
on average, without intervention of vaccine or condoms. This probability 
increases across risk levels, with the value for the high risk level 10 times 
that for the low risk level. A possible reason for the increase in transmission 
probability across risk levels is that subjects in higher risk levels might be
more likely to under-report the number of contacts.

Results for all other parameters are presented in Table 3. Surprisingly,
the reported use of condoms did not seem to be protective, with $\mathrm{OR}_{con}$
estimated as 1.44 (95% CS: 1.06, 1.94), suggesting that it increased the odds
of transmission by about 44%. A possible explanation is that the reporting
of condom use might be correlated with certain types of sexual behavior.
A more specific speculation is that monogamous subjects tended to use
condoms much less frequently and yet had lower risk of infection as compared
to those with multiple partners. We included an indicator for monogamy (on
average < 2 partners over the study period), but the estimate of $\mathrm{OR}_{con}$ did
not change much (results not shown).

High within-subject correlation is found among the contact rates and
proportions of condom use, with $\rho$ and $\gamma$ estimated as 0.92 (95% CS: 0.91,
0.92) and 0.65 (95% CS: 0.64, 0.67) respectively. These correlation parameters
indicate the magnitude of, but do not directly measure, the correlation
coefficients among the elements of $\lambda_i$ and among those of $\xi_i$. Based on posterior
medians of $\mu$, $\sigma^2$, $\alpha$ and $\beta$, we found that the mean contact rate in this cohort
is 0.21 (95% CS: 0.20, 0.22) times per day, and the mean proportion of condom
use is 0.51 (95% CS: 0.50, 0.52).
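
These two population means follow from the standard moment formulas for the log-normal and beta distributions. The short check below plugs in the posterior medians from Table 3. The assignment of the table columns to the location, scale and beta shape parameters is read from the table layout and should be treated as an assumption.

```python
import math

mu, sigma2 = -2.54, 1.95      # posterior medians of the log-normal parameters
alpha, beta = 0.30, 0.29      # posterior medians of the beta shape parameters

mean_rate = math.exp(mu + sigma2 / 2.0)   # mean of a log-normal variable
mean_prop = alpha / (alpha + beta)        # mean of a beta variable
print(round(mean_rate, 2), round(mean_prop, 2))   # 0.21 0.51
```

Both values agree with the figures quoted in the text.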

If a marginal gamma distribution is assumed for $\lambda_i$, we use the same copula
technique used for $\xi_i$ to introduce within-subject correlation. Changing
the distribution of the contact rate from log-normal to gamma does not affect
the estimates appreciably except for a slight increase in $\phi$ and decrease
in $\rho$. We compare predicted population-level means and variances of the
reported number of contacts yielded by the two distributions to the observed
values, shown in Figure 2(a)-(c). While the gamma distribution gives a pre- 
dicted mean closer to the observed mean, the log-normal distribution gives 
a more realistic standard deviation. The heavier tail of the log-normal dis- 
tribution can better catch extreme reported values. We choose not to ignore 
the extreme reported values, and therefore, all above results for VAX004 are 
based on the log-normal distribution for the contact rate. 

While we believe that our prior assumptions over most parameters are 
noninformative or toward-null, we performed a brief sensitivity analysis by 
changing the prior distribution of $p_0$. We impose a strong beta prior with
mean 0.0073 and standard deviation 0.001, instead of Uniform(0.0001, 0.1),
on $p_0$. The posterior estimates increase to 0.0063 (0.0052, 0.0075) for $p_0$ and
0.12 (-0.08, 0.29) for VE, and decrease to 1.28 (0.99, 1.68) for $\mathrm{OR}_{con}$, all
changes being mild. A higher prior mean of $p_0$ will cause more substantial
changes in the same directions. 
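
The shape parameters of a beta prior with a given mean and standard deviation can be recovered by the method of moments. The paper does not state the shape parameters it used, so the sketch below is only one way to obtain a prior matching the stated mean and standard deviation. The function name is illustrative.

```python
def beta_from_mean_sd(mean, sd):
    """Method-of-moments shape parameters of a beta distribution."""
    common = mean * (1.0 - mean) / sd ** 2 - 1.0
    if common <= 0:
        raise ValueError("sd too large for a beta distribution with this mean")
    return mean * common, (1.0 - mean) * common


a, b = beta_from_mean_sd(0.0073, 0.001)
print(round(a, 1), round(b, 1))
```

The two shape parameters sum to several thousand, which indicates how informative this prior is compared with the uniform prior used in the main analysis.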

4.2. VAX003: HIV transmission among IDUs using shared needles. In 
the Bayesian probability structure for trial VAX003, the over-dispersion 
structure and the related parameters, $\phi$ and $\delta_{it}$, are dropped, that is, we
assume $\tilde{n}_{it} \sim \mathrm{Poisson}(\lambda_{it} l_{it})$. The reason is that there is not sufficient
information about over-dispersion with only four categories for the contact rate. We
stratify the shape and scale parameters by incarceration injection history
($u_i$) for both injection rate ($\lambda_{it}$) and the proportion of needle sharing ($\xi_{it}$),
an attempt to control for confounding factors when we evaluate the effect of 
incarceration injection history on the transmission probability. The preva- 
lence of HIV among IDUs in Bangkok was around 30% [Kitayaporn et al. 
(1998)]. It was estimated that the relative prevalence between subtypes E 
and B was growing at a decreasing rate between 1998 and 2000, and reached 
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std: 114 
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Fig. 2. (a) Reported number of sexual contacts in VAX004- Values larger than 1000 
are truncated. The vertical line segments indicate the location of values between 200 and 
1000. (b) Predicted number of sexual contacts in VAX004, assuming gamma distribution 
for contact rate, (c) Predicted number of sexual contacts in VAX004, assuming log-normal 
distribution for contact rate, (d) Reported number of injections in VAX003. (e) Predicted 
number of injections in VAX003, assuming gamma distribution for injection rate, (f) Pre- 
dicted number of injections in VAX003, assuming log-normal distribution for injection 
rate. 



70%:30% in 2000 [Kitayaporn et al. (1998), Hudgens et al. (2002)]. Based 
on this information, the average relative prevalence most likely is between 
0.7:0.3 to 0.8:0.2. We use tt^ = 0.75 x 0.3 = 0.225 and vr^ = 0.075 for anal- 
yses stratified by subtype. 
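
The subtype-specific prevalences are simple products of the overall prevalence and the assumed subtype share. The short sketch below reproduces the values 0.225 and 0.075 and also evaluates the two alternative shares, 0.7 and 0.8, that bound the plausible range; the function name is ours.

```python
def subtype_prevalence(overall_prevalence, share_e):
    """Split an overall HIV prevalence into subtype E and subtype B parts."""
    return {
        "E": overall_prevalence * share_e,
        "B": overall_prevalence * (1.0 - share_e),
    }

for share in (0.70, 0.75, 0.80):
    prev = subtype_prevalence(0.30, share)
    print(f"E share {share:.2f}: pi_E = {prev['E']:.3f}, pi_B = {prev['B']:.3f}")
```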

We performed additional analyses stratified by two baseline behavioral 
risk levels defined in Pitisuttithum et al. (2006). A subject (and all his six- 
month intervals) is classified into the high baseline risk level if 2 or more 
of the following risk factors were present at visit 0: use of injection drugs 
regularly, use of injection drugs daily or weekly, use of injection drugs with 
shared needles, history of incarceration during the past 6 months, partner 
was an IDU, or shared needles with partner. Otherwise, the subject is clas- 
sified into the low risk level. 
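
The classification rule can be written compactly as a count of baseline risk factors. The sketch below is only an illustration of the rule as stated above; the dictionary keys are hypothetical labels, not variable names from the trial database.

```python
# Hypothetical labels for the six baseline risk factors listed above.
RISK_FACTORS = (
    "regular_injection",
    "daily_or_weekly_injection",
    "shared_needles",
    "recent_incarceration",
    "partner_idu",
    "shared_needles_with_partner",
)

def baseline_risk_level(flags):
    """Return 'high' if two or more risk factors are present at visit 0."""
    count = sum(bool(flags.get(name, False)) for name in RISK_FACTORS)
    return "high" if count >= 2 else "low"

subject = {"shared_needles": True, "partner_idu": True}
print(baseline_risk_level(subject))  # high
```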
Table 4
VAX003: Summary of the posterior distributions of the transmission probability and the
vaccine efficacy per infectious needle-sharing act for the overall study population and by
baseline risk level and HIV subtype, compared to the standard analysis

Risk     Subtype  Total^b  Infected^c  p                      VE (Bayesian)          VE (Cox^a)
level                                  Median  95% C.S.       Median  95% C.S.       Estimate  95% C.I.

Overall  All      13797    206         0.026   0.021, 0.031   -0.08   -0.43, 0.20    0.001     -0.31, 0.24
         E                 160         0.028   0.022, 0.034   -0.12   -0.52, 0.17    -0.014    -0.38, 0.25
         B                 32          0.019   0.012, 0.029   0.18    -0.57, 0.60
         E/B                           1.45    0.91, 2.39

Low      All      6622     80          0.033   0.024, 0.045   0.06    -0.49, 0.41
         E                 55          0.034   0.022, 0.048   0.04    -0.66, 0.42
         B                 16          0.032   0.015, 0.058   0.18    -1.33, 0.67
         E/B                           1.06    0.51, 2.54

High     All      7175     126         0.023   0.017, 0.029   -0.10   -0.60, 0.23
         E                 105         0.025   0.019, 0.032   -0.21   -0.77, 0.19
         B                 16          0.015   0.008, 0.026   0.34    -0.63, 0.77
         E/B                           1.68    0.92, 3.31

a Results based on Cox proportional hazards model in Pitisuttithum et al. (2006).
b Total number of six-month intervals.
c Intervals for 5 subjects infected by visit (E:4, B:1) are excluded. The 14 untypeable
infections are not shown.


The results for transmission probabilities and vaccine efficacies are
presented in Table 4. None of the VE estimates are significantly different
from 0. We estimate the VE per infectious needle-sharing act as -0.08 (95%
CS: -0.43, 0.20) for overall transmission and as -0.12 (95% CS: -0.52, 0.17)
for subtype E. Although subtype B tends to have a better VE than subtype
E, the difference is not significant. Pitisuttithum et al. (2006) reported
similar VE estimates, 0.001 (95% CI: -0.31, 0.24) for the overall IDU cohort
and -0.014 (95% CI: -0.38, 0.25) for subtype E, based on a Cox proportional
hazards model for grouped times.

The baseline transmission probability per injection using a needle shared
with an HIV-positive IDU is 0.026 (95% CS: 0.021, 0.031), suggesting that,
out of 100 such injections, 2.6 on average will transmit the virus. The
subtype-specific baseline transmission probabilities are estimated as 0.028
(95% CS: 0.022, 0.034) for $p_0^{(E)}$ and 0.019 (95% CS: 0.012, 0.029) for
$p_0^{(B)}$, higher than the 0.016 (95% CI: 0.012, 0.02) and 0.0063 (95% CI:
0.0041, 0.0092) estimated in Hudgens et al. (2002) based on a likelihood method.
It is interesting that the transmission probability per injection is somewhat
higher for the low versus high baseline risk level, opposite to the direction
observed in VAX004. The ratio of $p_0^{(E)}$ to $p_0^{(B)}$, with a posterior
median of 1.45 (95% CS: 0.91, 2.39), is only marginally different from 1 and
is lower than the 2.48 (95% CI: 1.63, 3.88) reported in Hudgens et al. (2002).

Table 5
VAX003: Summary of the posterior distributions of other parameters for the overall
study population

                                         With incarceration injection history    Without incarceration injection history
Posterior  OR        $\rho$    $\gamma$  Shape^a   Scale^b   $\alpha$  $\beta$   Shape^a   Scale^b   $\alpha$  $\beta$
quantiles

Median     0.47      0.50      0.47      0.24      1.87      0.23      1.36      0.20      1.25      0.23      5.28
2.5%       0.30      0.48      0.44      0.21      1.60      0.20      1.12      0.19      1.18      0.22      4.85
97.5%      0.72      0.52      0.51      0.27      2.24      0.26      1.66      0.21      1.32      0.25      5.75

a Shape of the gamma distribution for contact rate.
b Scale of the gamma distribution for contact rate.
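
To relate the per-act baseline probability of 0.026 to cumulative risk, the sketch below computes the expected number of transmissions and the probability of at least one transmission over repeated injections with needles shared with an HIV-positive IDU. It assumes that acts are independent with a constant per-act probability, which is a simplification made here for illustration.

```python
def prob_any_transmission(p_per_act, n_acts):
    """P(at least one transmission) over n_acts independent infectious exposures."""
    return 1.0 - (1.0 - p_per_act) ** n_acts

p0 = 0.026  # posterior median of the baseline per-act probability (Table 4)
expected_per_100 = 100 * p0
risk_100 = prob_any_transmission(p0, 100)
print(f"expected transmissions per 100 acts: {expected_per_100:.1f}")
print(f"P(at least one transmission in 100 acts): {risk_100:.3f}")
```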

Table 5 summarizes estimates for all other parameters. The odds ratio for
incarceration injection is estimated as 0.47 (95% CS: 0.30, 0.72).
Hudgens et al. (2002) reported a much higher value, 4.47 (95% CI: 2.63,
7.19), where a time-varying prevalence ratio with an average of about
0.55:0.45 between subtypes E and B and a common proportion of 4% for needle
sharing across the whole population were assumed. Among subjects with
incarceration injection history, the mean injection rate is 0.45 (95% CS:
0.37, 0.54) times per day and 14% (95% CS: 12%, 17%) of injections involved
shared needles. In contrast, among those without incarceration injection
history, the mean injection rate is 0.25 (95% CS: 0.24, 0.27) times per day
and 4.2% (95% CS: 4.0%, 4.5%) of injections involved shared needles. The
assumption of a common proportion of needle-sharing in Hudgens et al. (2002)
lowers the injection frequency and the proportion of needle-sharing to the
overall level for subjects with incarceration history, and consequently
increases their adjusted transmission probability. In addition, the
incarceration injection indicator is defined for each interval in Hudgens
et al. (2002), whereas we define it for each individual. Posterior estimates
of $\rho$, 0.50 (95% CS: 0.48, 0.52), and $\gamma$, 0.47 (95% CS: 0.44, 0.51),
suggest substantial within-subject correlation, though not as high as
those in VAX004.
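
As a rough consistency check, the mean injection rate is the product of the gamma shape and scale, and the mean proportion of needle sharing is $\alpha/(\alpha+\beta)$. The sketch below plugs in the posterior medians from Table 5; because a median of a product is not the product of medians, agreement with the reported means is only approximate.

```python
def gamma_mean(shape, scale):
    return shape * scale

def beta_mean(alpha, beta):
    return alpha / (alpha + beta)

# Posterior medians read from Table 5.
with_history = {"shape": 0.24, "scale": 1.87, "alpha": 0.23, "beta": 1.36}
without_history = {"shape": 0.20, "scale": 1.25, "alpha": 0.23, "beta": 5.28}

for label, par in (("with", with_history), ("without", without_history)):
    rate = gamma_mean(par["shape"], par["scale"])
    share = beta_mean(par["alpha"], par["beta"])
    print(f"{label} history: rate = {rate:.2f} per day, sharing = {share:.1%}")
```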

As in VAX004, log-normal and gamma distributions for the injection rate lead
to similar results, with a slight difference in $\rho$. In Figure 2(d)-(f), we
see that the heavy tail of the log-normal distribution yields extremely large
predicted moments for the reported number of injections and thus makes it
less competitive than the gamma distribution for modeling injection rates
reported in a few categories. Consequently, all results presented for VAX003
are based on the gamma distribution for the injection rate.

We performed sensitivity analyses by changing the relative prevalence
to 0.7:0.3 and 0.8:0.2. As expected, the transmission probability
tends to decrease for subtype E but to increase for subtype B as the
relative prevalence of subtype E increases. For each subtype and risk level,
the VE estimate changes in the direction opposite to that of the
corresponding transmission probability, but none of the VE estimates differ
significantly from 0. The magnitude of all these changes is relatively small,
especially for subtype E. The estimated transmission probability ratio of
subtype E to subtype B decreases as the relative prevalence of subtype E
increases. In particular, subtype E becomes statistically more infectious
than subtype B, with an estimate of 1.88 (95% CS: 1.18, 3.21) for
$p_0^{(E)}/p_0^{(B)}$, if the relative prevalence of subtype E among the IDUs
is as low as 70%.

5. Discussion. We established a Bayesian hierarchical model for ana- 
lyzing clinical studies of infectious disease with transmission and exposure 
data observed over discrete time intervals. This model provides assessment 
of the transmission probability and vaccine efficacy conditioning on an in- 
fectious contact, whereas standard methods of analyzing vaccine trials do 
not. Assuming conditional independence between observed and true but un- 
observed quantities, this model provides an approach to adjustment for the 
measurement error in some key risk factors. We used the method to re-analyze
two HIV-1 vaccine trials in populations at high risk of HIV transmission
via sexual contact or needle sharing for drug injection.
The proposed method could be applied to studies of other vaccines, such as 
human papilloma virus vaccines, where contact information is collected. 

We obtained estimates of vaccine efficacy similar to the primary study 
results, especially for VAX004, confirming the findings of no protective 
efficacy. Two factors may contribute to this similarity in VE estimates. 
First, the measurement error might be relatively small for the majority 
of the study population. Second, our model assumes unbiasedness, that is,
$E(\tilde{n}_{it}|\lambda_{it}) = E(n_{it}|\lambda_{it})$ and
$E(\tilde{m}_{it}|\lambda_{it},\xi_{it}) = E(m_{it}|\lambda_{it},\xi_{it})$.
However, if the bias trend is similar in both treatment groups, even a model
with bias correction will likely yield a similar VE estimate. Despite the similarity,
our hierarchical model provides joint inference on not only the transmission 
probability and VE but also the population-level behavioral characteristics 
such as the contact rate and proportion of condom use (needle-sharing). 

We have assumed an exchangeable structure for within-subject correlation
among contact (injection) rates and proportions of condom use
(needle-sharing), using the copula method. A more sophisticated structure may
be considered given sufficient data. Within-subject sample correlation
coefficients among the logarithms of reported contact rates,
$\{\log(\tilde{n}_{it}/l_{it}): t = 1,\ldots,T_i\}$, and among reported
proportions of condom use, $\{\tilde{m}_{it}/\tilde{n}_{it}: t = 1,\ldots,T_i\}$,
in VAX004 do indicate that correlation wanes as two intervals move further
apart, but the range of variation is relatively small, 0.3-0.5 for
the former and 0.45-0.67 for the latter. Therefore, an exchangeable
structure is a reasonable assumption, although an autoregressive structure
such as the ARMA(p, q) model [Chib and Greenberg (1994)] may be more realistic.
The range of 0.3-0.5 for $\{\log(\tilde{n}_{it}/l_{it}): t = 1,\ldots,T_i\}$
may seem contradictory to the Bayesian estimate of $\rho$ around 0.9. A
plausible explanation is that the addition of $\delta_{it}$ to reflect the
over-dispersion may attenuate the true correlation among the elements of
$\lambda_i$, as the elements of $\delta_i$ are independent given $\lambda_i$.
Consequently, a high correlation among the elements of $\lambda_i$ is needed
to yield a moderate marginal correlation among the elements of $\tilde{n}_i$.
In fact, the parameter estimates, especially for transmission probabilities
and VEs, do not change much if we assume intervals within the same subject
are independent. A possible reason is that only the overall magnitudes of
$n_i$ and $m_i$ matter in the estimation of $p_0$ and the VE, and these
magnitudes mainly depend on the observed $\tilde{n}_i$ and $\tilde{m}_i$ and
are much less affected by the correlation. However, we do see that
correlation adjustment changes the shape and scale of the distributions of
the contact rate $\lambda_{it}$ and the proportion $\xi_{it}$ in a more
noticeable way. For example, without incarceration injection history, the
estimate for the shape parameter $\beta$ for the proportion of needle-sharing
in VAX003 changes from 5.28 (95% CS: 4.85, 5.75) to 6.3 (95% CS: 5.92, 6.69)
when within-subject independence is assumed.
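
The difference between the exchangeable structure used here and an autoregressive alternative can be seen from the correlation matrices themselves. The sketch below builds both for a subject with a hypothetical number of intervals; an AR(1) matrix is shown as the simplest autoregressive case and is our choice for illustration, not a structure fitted in this paper.

```python
def exchangeable_corr(n_intervals, rho):
    """R = rho * J + (1 - rho) * I: equal correlation between any two intervals."""
    return [[1.0 if s == t else rho for t in range(n_intervals)]
            for s in range(n_intervals)]

def ar1_corr(n_intervals, rho):
    """AR(1): correlation decays as rho ** |s - t| with the lag between intervals."""
    return [[rho ** abs(s - t) for t in range(n_intervals)]
            for s in range(n_intervals)]

for name, matrix in (("exchangeable", exchangeable_corr(4, 0.5)),
                     ("AR(1)", ar1_corr(4, 0.5))):
    print(name)
    for row in matrix:
        print("  ".join(f"{value:.3f}" for value in row))
```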

To adjust for error in self-reported contact information, we assumed a
Poisson process for the true number of contacts and an over-dispersed Poisson
process for the reported one, and that the two processes are conditionally
independent given the underlying contact rate. Ideally, validation data
would be available so that the measurement error could be modeled
parametrically or without parametric assumptions as in Golm, Halloran and
Longini (1999). The collection of validation data would be useful in future
vaccine trials. In this Bayesian framework, had validation data been
available on contact frequency, a more general bivariate distribution could
be modeled between $n_{it}$ and $\tilde{n}_{it}$ given $\lambda_{it}$, or
between $\lambda_{it}$ and a latent rate $\tilde{\lambda}_{it}$ that
determines the distribution of $\tilde{n}_{it}$. Another form of additional
data, replication of $\tilde{n}_{it}$ and $\tilde{m}_{it}$ in all or some of
the intervals, can also improve model precision [Carroll, Ruppert and
Stefanski (1995)], but the assumption of unbiasedness of $\tilde{n}_{it}$ for
the true $n_{it}$ has to be retained. A possible parametric utilization
of replication data in our model is to allow for within-interval correlation.

Other than log-normal and gamma distributions, a more flexible option for
modeling the contact rate may be mixture prior densities [Richardson et al.
(2002)]. It is likely that the true number of contacts also comes from an
over-dispersed Poisson process, but whether such a model is identifiable
needs further investigation. When the number of contacts is given as K
categories and K is small, for example, in trial VAX003, the Poisson and
over-dispersed Poisson structure may not be realistic. In that case, a more
flexible probability structure is to assume that $n_{it}$ and $\tilde{n}_{it}$
independently follow a discrete distribution indexed by
$p_{it} = (p_{it1},\ldots,p_{itK})^{\top}$, where $p_{itk}$ is the probability
of falling in the kth category for interval (i,t), and
$p_{it} \sim \mathrm{Dirichlet}(\alpha)$ for some random or known vector $\alpha$.
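
A minimal sketch of this alternative structure is given below: category probabilities are drawn from a Dirichlet distribution, and the true and reported categories are then drawn independently from the same probability vector. The value of K and the concentration vector are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_category(probs, rng):
    """Draw a category index 0..K-1 with the given probabilities."""
    u = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return k
    return len(probs) - 1  # guard against rounding in the cumulative sum

rng = random.Random(1)
alpha = [2.0, 1.0, 1.0, 0.5]           # hypothetical concentration, K = 4
p_it = sample_dirichlet(alpha, rng)    # category probabilities for interval (i, t)
true_category = sample_category(p_it, rng)
reported_category = sample_category(p_it, rng)
print(p_it, true_category, reported_category)
```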

The model is sensitive to the contact-related information when such in- 
formation is limited. For instance, when the value assigned to the "None" 
category of the reported proportion of needle-sharing was increased from 
0.5% to 5% or higher, we were unable to obtain convergence, likely due 
to the lack of curvature supporting the estimation of a beta density. We 
emphasize for future studies that, in terms of contact frequency, numbers 
are more informative than categories, and more categories are preferred to 
fewer. Another factor to which the analyses are sensitive is the prevalence 
of infections among partners. While it is impossible to obtain the infection 
status of all partners, a validation set of partners randomly selected for ver- 
ification of infection would help improve the inference. To alleviate under- 
or over-reporting of contact frequency, it is also important to ensure that 
study participants understand the definition of a contact, especially when 
the study involves multiple contact types. Extremely high frequencies, for
example, the more than one thousand sexual contacts per six-month interval
reported by several participants in VAX004, may indicate misunderstanding
of the definition, and should be verified with the participants
during the follow-up visits. The underlying mechanism of measurement er- 
ror in contact-related factors in real studies may never be known, and the 
best way to improve the VE estimation is to reduce the error at the data 
collection step. 

APPENDIX: MCMC METHODS AND RELATED SAMPLING ISSUES 

MCMC sampling schemes. We use $f_{\mathrm{dist}}(\cdot|\cdot)$ to denote the
PDF for continuous variables or the PMF for discrete variables, and
$F_{\mathrm{dist}}(\cdot|\cdot)$ to denote the CDF of a random variable given
parameters. The subscript "dist" could be "Bin" for binomial, "Pois" for
Poisson, "Beta" for beta, "G" for gamma, "IG" for inverse gamma, "N" for
normal and "LN" for log-normal distributions. Whether the distribution is
univariate or multivariate is determined by the parameter input.

Sampling $n_{it}$. Define $q_{i1} = 1 - p(v_i, 1)$ as the probability of
escaping infection from a contact protected by condom use, and similarly,
define $q_{i0} = 1 - p(v_i, 0)$ for an unprotected contact. The conditional
probability of $n_{it}$ is given by

$$
\Pr(n_{it} = n \mid \cdot) =
\begin{cases}
\dfrac{\{\lambda_{it} l_{it}(1-\xi_{it}) q_{i0}\}^{n-m_{it}}
       \exp\{-\lambda_{it} l_{it}(1-\xi_{it}) q_{i0}\}}{(n-m_{it})!},
  & y_{it} = 0, \\[2ex]
\dfrac{\{\lambda_{it} l_{it}(1-\xi_{it})\}^{n-m_{it}}
       \exp\{-\lambda_{it} l_{it}(1-\xi_{it})\}}{(n-m_{it})!}
  \times \dfrac{1 - q_{i1}^{m_{it}} q_{i0}^{n-m_{it}}}{C_{it}},
  & y_{it} = 1,
\end{cases}
$$

where $C_{it} = 1 - q_{i1}^{m_{it}} \exp\{\lambda_{it} l_{it}(1-\xi_{it})(q_{i0}-1)\}$.
When $y_{it} = 0$, we sample $n_{it} - m_{it}$ directly from
Poisson($\lambda_{it} l_{it}(1-\xi_{it}) q_{i0}$). When $y_{it} = 1$, note that
the conditional CDF of $n_{it}$ is

$$
\Pr(n_{it} \le n \mid \cdot, y_{it} = 1) =
\frac{F_{\mathrm{Pois}}(n - m_{it} \mid \lambda_{it} l_{it}(1-\xi_{it}))
      - (1 - C_{it})\, F_{\mathrm{Pois}}(n - m_{it} \mid \lambda_{it} l_{it}(1-\xi_{it}) q_{i0})}
     {C_{it}}.
$$

As the CDF is a nondecreasing function, we use direct sampling in combination
with binary searching. For example, to sample $n_{it}$, we generate a value
z from Uniform(0, 1); then, the smallest n satisfying
$\Pr(n_{it} \le n \mid \cdot, y_{it} = 1) \ge z$ is the sampled value of
$n_{it}$ and can be found using binary searching or other advanced searching
methods.
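
The direct-sampling step can be sketched as follows. This is a minimal illustration rather than the code used for the analyses: the function names, the doubling search for an upper bracket and the parameter values in the usage line are our own assumptions, and the CDF for an infected interval is written as the PMF above summed in closed form using Poisson CDFs.

```python
import math
import random

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean); 0 for k < 0. Adequate for moderate means."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)
    total = term
    for j in range(1, k + 1):
        term *= mean / j
        total += term
    return min(total, 1.0)

def cdf_infected(k, mean, q1, q0, m):
    """P(n - m <= k | y = 1), where mean = lambda * l * (1 - xi) > 0."""
    escape_all = q1 ** m * math.exp(mean * (q0 - 1.0))  # equals 1 - C_it
    numer = poisson_cdf(k, mean) - escape_all * poisson_cdf(k, mean * q0)
    return numer / (1.0 - escape_all)

def smallest_k(cdf, z, max_k=1 << 20):
    """Smallest integer k >= 0 with cdf(k) >= z, found by bracketing and bisection."""
    hi = 1
    while cdf(hi) < z and hi < max_k:
        hi *= 2
    lo = -1  # cdf(-1) = 0 < z
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cdf(mid) >= z:
            hi = mid
        else:
            lo = mid
    return hi

def sample_n(y, m, lam, length, xi, q1, q0, rng):
    """Draw the true number of contacts n >= m given the infection outcome y."""
    mean = lam * length * (1.0 - xi)
    z = 1.0 - rng.random()  # uniform on (0, 1]
    if y == 0:
        k = smallest_k(lambda j: poisson_cdf(j, mean * q0), z)
    else:
        k = smallest_k(lambda j: cdf_infected(j, mean, q1, q0, m), z)
    return m + k

rng = random.Random(0)
print(sample_n(1, 0, 0.2, 180.0, 0.5, 0.999, 0.99, rng))
```

With no protected contacts (m = 0) an infected interval must contain at least one unprotected contact, which the sampler respects because the conditional CDF is zero at n = 0.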

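Sampling $m_{it}$. The conditional probability of $m_{it}$ is

$$
\Pr(m_{it} = m \mid \cdot) =
\begin{cases}
\dbinom{n_{it}}{m}
  \dfrac{(\xi_{it} q_{i1})^{m} \{(1-\xi_{it}) q_{i0}\}^{n_{it}-m}}{1 - D_{it}},
  & y_{it} = 0, \\[2ex]
\dbinom{n_{it}}{m} \xi_{it}^{m} (1-\xi_{it})^{n_{it}-m}
  \times \dfrac{1 - q_{i1}^{m} q_{i0}^{n_{it}-m}}{D_{it}},
  & y_{it} = 1,
\end{cases}
$$

where $D_{it} = 1 - [\xi_{it} q_{i1} + (1-\xi_{it}) q_{i0}]^{n_{it}}$. When
$y_{it} = 0$, we sample $m_{it}$ directly from
Binomial($n_{it}$, $\xi_{it} q_{i1} / \{\xi_{it} q_{i1} + (1-\xi_{it}) q_{i0}\}$).
When $y_{it} = 1$, we have

$$
\Pr(m_{it} \le m \mid \cdot, y_{it} = 1) =
\frac{F_{\mathrm{Bin}}(m \mid n_{it}, \xi_{it})
      - (1 - D_{it})\, F_{\mathrm{Bin}}(m \mid n_{it}, \psi_{it})}{D_{it}},
$$

where $\psi_{it} = \xi_{it} q_{i1} / \{\xi_{it} q_{i1} + (1-\xi_{it}) q_{i0}\}$.
We use the same technique as in sampling $n_{it}$, that is, direct sampling in
combination with binary searching.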
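
Because $m_{it}$ has finite support, the same idea can be sketched by tabulating the conditional PMF, accumulating it into a CDF and locating the uniform draw by bisection. The code below is an illustrative sketch under that approach, with hypothetical parameter values; it builds the PMF from a binomial prior weight multiplied by the probability of the observed infection outcome.

```python
import bisect
import itertools
import math
import random

def m_pmf(n, xi, q1, q0, y):
    """Conditional PMF of the number of protected contacts m = 0..n given outcome y."""
    weights = []
    for m in range(n + 1):
        prior = math.comb(n, m) * xi ** m * (1.0 - xi) ** (n - m)
        escape = q1 ** m * q0 ** (n - m)  # P(no infection | m, n)
        weights.append(prior * (escape if y == 0 else 1.0 - escape))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("outcome has zero probability for these inputs")
    return [w / total for w in weights]

def sample_m(n, xi, q1, q0, y, rng):
    """Inverse-CDF draw of m using binary search on the tabulated CDF."""
    cdf = list(itertools.accumulate(m_pmf(n, xi, q1, q0, y)))
    z = rng.random()
    index = bisect.bisect_left(cdf, z)  # smallest m with cdf[m] >= z
    return min(index, n)                # guard against rounding at the top

rng = random.Random(3)
print(sample_m(12, 0.4, 0.999, 0.99, 1, rng))
```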
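Sampling $\lambda_{i}$. Define $\mu_i = \mu 1_{T_i \times 1}$ and
$\Sigma_i = \sigma^2\{\rho J_{T_i \times T_i} + (1-\rho) I_{T_i \times T_i}\}$.
The likelihood part concerning the contact rate vector $\lambda_i$ is given by

$$
L_i(\lambda_i \mid \cdot) \propto f_{LN}(\lambda_i \mid \mu_i, \Sigma_i)
\left[\prod_{t=1}^{T_i} f_G(\lambda_{it} \mid n_{it} + 1, l_{it}^{-1})\right]
\left[\prod_{t=1}^{T_i} \lambda_{it}\,
      f_{IG}(\lambda_{it} \mid \phi, l_{it}/(\phi\delta_{it}))\right].
$$

To sample $\lambda_i$, we take the following steps:

• First sample $\lambda_i^*$ from Log-Normal($\mu_i$, $\Sigma_i$), and accept it
with the probability

$$
\min\left(1,\
\frac{\prod_{t=1}^{T_i}\{f_G(\lambda_{it}^* \mid n_{it}+1, l_{it}^{-1})\,
      f_{IG}(\lambda_{it}^* \mid \phi, l_{it}/(\phi\delta_{it}))\,\lambda_{it}^*\}}
     {\prod_{t=1}^{T_i}\{f_G(\lambda_{it} \mid n_{it}+1, l_{it}^{-1})\,
      f_{IG}(\lambda_{it} \mid \phi, l_{it}/(\phi\delta_{it}))\,\lambda_{it}\}}
\right).
$$

Update $\lambda_i$ with $\lambda_i^*$ if the new sample is accepted;

• Sample a new $\lambda_i^*$ from
$\prod_{t=1}^{T_i} f_G(\lambda_{it} \mid n_{it}+1, l_{it}^{-1})$ and accept it
with the probability

$$
\min\left(1,\
\frac{f_{LN}(\lambda_i^* \mid \mu_i, \Sigma_i)
      \prod_{t=1}^{T_i}\{\lambda_{it}^*\,
      f_{IG}(\lambda_{it}^* \mid \phi, l_{it}/(\phi\delta_{it}))\}}
     {f_{LN}(\lambda_i \mid \mu_i, \Sigma_i)
      \prod_{t=1}^{T_i}\{\lambda_{it}\,
      f_{IG}(\lambda_{it} \mid \phi, l_{it}/(\phi\delta_{it}))\}}
\right).
$$

Update $\lambda_i$ with $\lambda_i^*$ if the new sample is accepted;

• Sample a new $\lambda_i^*$ from
$\prod_{t=1}^{T_i} f_{IG}(\lambda_{it} \mid \phi, l_{it}/(\phi\delta_{it}))$ and
accept it with the probability

$$
\min\left(1,\
\frac{f_{LN}(\lambda_i^* \mid \mu_i, \Sigma_i)
      \prod_{t=1}^{T_i}\{\lambda_{it}^*\,
      f_G(\lambda_{it}^* \mid n_{it}+1, l_{it}^{-1})\}}
     {f_{LN}(\lambda_i \mid \mu_i, \Sigma_i)
      \prod_{t=1}^{T_i}\{\lambda_{it}\,
      f_G(\lambda_{it} \mid n_{it}+1, l_{it}^{-1})\}}
\right).
$$

Update $\lambda_i$ with $\lambda_i^*$ if the new sample is accepted.
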
This cross-sampling procedure is a generalization of the Metropolized
independence sampling algorithm [Chib and Greenberg (1995)]. Liu (1996)
showed that Metropolized independence sampling is superior to rejection
sampling with respect to asymptotic efficiency and ease of computation,
given that the proposal density provides a reasonable coverage over the
domain of the posterior density. In this case, we have a composite full
likelihood $L(x) \propto f(x)g(x)$ in which $f(x)$ and $g(x)$ are both ready
for sampling. Using $f(x)$ and $g(x)$ alternately as the proposal density can
better cover the reasonable range of x as compared to using either $f(x)$ or
$g(x)$ alone as the proposal density.
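
The idea of alternating proposals can be illustrated on a toy target. In the sketch below, which is not the sampler used for the contact rates, the target is proportional to the product of two normal densities, f = N(0, 1) and g = N(2, 1), so that the exact answer is N(1, 1/2). When proposing from f the acceptance ratio involves only g, and vice versa.

```python
import math
import random

def normal_logpdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def cross_sample(n_iter, rng, mean_f=0.0, mean_g=2.0, sd=1.0):
    """Metropolized independence sampling with f and g used alternately as proposals."""
    x = 0.0
    draws = []
    for _ in range(n_iter):
        # Proposal from f: the target/proposal weight is g, so the ratio uses g only.
        proposal = rng.gauss(mean_f, sd)
        log_ratio = normal_logpdf(proposal, mean_g, sd) - normal_logpdf(x, mean_g, sd)
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
        # Proposal from g: the ratio uses f only.
        proposal = rng.gauss(mean_g, sd)
        log_ratio = normal_logpdf(proposal, mean_f, sd) - normal_logpdf(x, mean_f, sd)
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
        draws.append(x)
    return draws

draws = cross_sample(20000, random.Random(42))
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")
```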

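Sampling $\varepsilon_i$ and $\xi_i$. Define
$\Gamma_i = \gamma J_{T_i \times T_i} + (1-\gamma) I_{T_i \times T_i}$. The
likelihood part concerning $\varepsilon_i$ is given by

$$
L_i(\varepsilon_i \mid \cdot) \propto f_N(\varepsilon_i \mid 0, \Gamma_i) \times
\prod_{t=1}^{T_i} \xi_{it}^{m_{it}+\tilde{m}_{it}}
(1-\xi_{it})^{n_{it}-m_{it}+\tilde{n}_{it}-\tilde{m}_{it}}.
\tag{13}
$$

The above likelihood is expressed in terms of $\varepsilon_i$, and $\xi_i$
exists through $\xi_{it} = F_{Beta}^{-1}(\Phi(\varepsilon_{it}) \mid \alpha, \beta)$.
To express the likelihood in terms of $\xi_i$, (13) becomes

$$
L_i(\xi_i \mid \cdot) \propto
\exp\{-\tfrac{1}{2}\varepsilon_i^{\top}\Gamma_i^{-1}\varepsilon_i
      + \tfrac{1}{2}\varepsilon_i^{\top}\varepsilon_i\}
\times \prod_{t=1}^{T_i} f_{Beta}(\xi_{it} \mid \alpha + m_{it} + \tilde{m}_{it},\
\beta + n_{it} - m_{it} + \tilde{n}_{it} - \tilde{m}_{it}),
\tag{14}
$$

where $\varepsilon_i$ exists via
$\varepsilon_{it} = \Phi^{-1}(F_{Beta}(\xi_{it} \mid \alpha, \beta))$,
$|\prod_{t=1}^{T_i}\{\Phi^{-1\prime}(F_{Beta}(\xi_{it} \mid \alpha, \beta))
f_{Beta}(\xi_{it} \mid \alpha, \beta)\}|$ is the Jacobian term, and
$\Phi^{-1\prime}(x) = [f_N(\Phi^{-1}(x) \mid 0, 1)]^{-1}$.
The sampling of $\varepsilon_i$ and $\xi_i$ proceeds as follows:

• Based on (13), sample $\varepsilon_i^*$ from Normal(0, $\Gamma_i$), and accept
it with the probability

$$
\min\left(1,\
\prod_{t=1}^{T_i}
\frac{(\xi_{it}^*)^{m_{it}+\tilde{m}_{it}}
      (1-\xi_{it}^*)^{n_{it}-m_{it}+\tilde{n}_{it}-\tilde{m}_{it}}}
     {\xi_{it}^{m_{it}+\tilde{m}_{it}}
      (1-\xi_{it})^{n_{it}-m_{it}+\tilde{n}_{it}-\tilde{m}_{it}}}
\right),
$$

where $\xi_{it}^* = F_{Beta}^{-1}(\Phi(\varepsilon_{it}^*) \mid \alpha, \beta)$.
Update $\varepsilon_i$ and $\xi_i$ if the new sample is accepted;

• Based on (14), sample $\xi_i^*$ from
$\prod_{t=1}^{T_i} f_{Beta}(\alpha + m_{it} + \tilde{m}_{it},\
\beta + (n_{it} - m_{it}) + (\tilde{n}_{it} - \tilde{m}_{it}))$, and accept it
with the probability

$$
\min\left(1,\
\frac{\exp\{-\tfrac{1}{2}\varepsilon_i^{*\top}\Gamma_i^{-1}\varepsilon_i^{*}
            + \tfrac{1}{2}\varepsilon_i^{*\top}\varepsilon_i^{*}\}}
     {\exp\{-\tfrac{1}{2}\varepsilon_i^{\top}\Gamma_i^{-1}\varepsilon_i
            + \tfrac{1}{2}\varepsilon_i^{\top}\varepsilon_i\}}
\right),
$$

where $\varepsilon_{it}^* = \Phi^{-1}(F_{Beta}(\xi_{it}^* \mid \alpha, \beta))$.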
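
The copula mapping between $\varepsilon_i$ and $\xi_i$ can be sketched with the standard library alone if the beta marginal is restricted to Beta(1, b), whose CDF, $1 - (1-x)^b$, has a closed-form inverse. This restriction is our simplification for illustration; a general Beta($\alpha$, $\beta$) marginal would require a numerical inverse CDF. The values of $\gamma$ and b below are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def exchangeable_normals(n_intervals, gamma, rng):
    """Standard normals with common pairwise correlation gamma (0 <= gamma <= 1)."""
    shared = rng.gauss(0.0, 1.0)
    return [math.sqrt(gamma) * shared + math.sqrt(1.0 - gamma) * rng.gauss(0.0, 1.0)
            for _ in range(n_intervals)]

def beta1b_cdf(x, b):
    """CDF of Beta(1, b)."""
    return 1.0 - (1.0 - x) ** b

def beta1b_inv_cdf(u, b):
    """Inverse CDF of Beta(1, b)."""
    return 1.0 - (1.0 - u) ** (1.0 / b)

def eps_to_xi(eps, b):
    """xi_t = F_Beta^{-1}(Phi(eps_t)): correlated normals to correlated proportions."""
    return [beta1b_inv_cdf(STD_NORMAL.cdf(e), b) for e in eps]

def xi_to_eps(xi, b):
    """eps_t = Phi^{-1}(F_Beta(xi_t)): the inverse mapping used in (14)."""
    return [STD_NORMAL.inv_cdf(beta1b_cdf(x, b)) for x in xi]

rng = random.Random(5)
eps = exchangeable_normals(4, 0.5, rng)
xi = eps_to_xi(eps, 6.0)
print([round(x, 3) for x in xi])
```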

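Sampling other parameters. Let
$\log\lambda_i = (\log\lambda_{i1}, \ldots, \log\lambda_{iT_i})^{\top}$,
$\mu_i = \mu 1_{T_i \times 1}$, and let
$R_i = \rho J_{T_i \times T_i} + (1-\rho) I_{T_i \times T_i}$ such that
$\Sigma_i = \sigma^2 R_i$.

The following parameters are sampled directly from their full conditional
distributions:

$$
\delta_{it} \mid \cdot \sim \mathrm{Gamma}\left(\tilde{n}_{it} + \phi,\
\left(1 + \frac{\phi}{\lambda_{it} l_{it}}\right)^{-1}\right),
$$

$$
\mu \mid \cdot \sim \mathrm{Normal}\left(
\frac{\sum_{i} 1^{\top} R_i^{-1} \log\lambda_i}{\sum_{i} 1^{\top} R_i^{-1} 1},\ \ldots\right),
$$

$$
\sigma^2 \mid \cdot \sim \mathrm{Inverse\ Gamma}\left(\frac{1}{2}\sum_{i} T_i,\ \ldots\right).
$$
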

A random-walk style Metropolis-Hastings algorithm is used to sample $\rho$,
$\phi$, $\alpha$, $\beta$, $\gamma$, $p_0$ and $\theta$, that is, a new value
is sampled from a normal density with the current value as its mean. The
variance of each proposal normal density is dynamically adapted to reach an
acceptance rate of 0.3-0.4. To apply this sampling scheme, an appropriate
transformation may be necessary so that the domain of the transformed
parameter is $(-\infty, \infty)$, for example, a logit transformation for the
transmission probability.
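
A minimal sketch of such a sampler for a single probability is given below. It is an illustration under our own assumptions, not the code used for the trials: the target is a Beta(k+1, n-k+1) posterior for a probability p with hypothetical counts k and n, sampling is done on the logit scale with the Jacobian included, and the proposal standard deviation is adjusted in batches during a tuning phase and then held fixed.

```python
import math
import random

def log_target(eta, k, n):
    """Log density (up to a constant) of eta = logit(p) when p ~ Beta(k+1, n-k+1)."""
    log_p = -math.log1p(math.exp(-eta))   # log p
    log_q = -math.log1p(math.exp(eta))    # log (1 - p)
    # p^k (1-p)^(n-k) times the Jacobian p (1-p) of the logit transform.
    return (k + 1) * log_p + (n - k + 1) * log_q

def adaptive_rw(k, n, rng, n_tune=5000, n_keep=5000, batch=100):
    """Random-walk Metropolis on the logit scale with batch-wise step adaptation."""
    eta, step, accepted = 0.0, 1.0, 0
    draws = []
    for it in range(1, n_tune + n_keep + 1):
        proposal = rng.gauss(eta, step)
        log_ratio = log_target(proposal, k, n) - log_target(eta, k, n)
        if math.log(1.0 - rng.random()) < log_ratio:
            eta = proposal
            accepted += 1
        if it <= n_tune and it % batch == 0:
            rate = accepted / batch
            if rate < 0.3:
                step *= 0.8     # too few acceptances: take smaller steps
            elif rate > 0.4:
                step *= 1.25    # too many acceptances: take larger steps
            accepted = 0
        if it > n_tune:         # proposal variance is fixed from here on
            draws.append(1.0 / (1.0 + math.exp(-eta)))
    return draws, step

draws, step = adaptive_rw(26, 1000, random.Random(7))
print(f"posterior mean of p = {sum(draws) / len(draws):.4f}, tuned step = {step:.2f}")
```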
Diagnostics for convergence. We run three chains simultaneously and
use the scale reduction factor to monitor the convergence of the chains. The
scale reduction factor is defined as

$$
R = \frac{M-1}{M} + \frac{1}{M}\frac{B}{W},
$$

where M is the number of runs, and B and W are the between-sequence and
within-sequence variances, respectively. Gelman and Rubin (1992) showed
that the factor $\sqrt{R}$ will approach 1 as $M \to \infty$, and recommended
that convergence be considered reached if $\sqrt{R} < 1.2$ for all
parameters. We calculate $\sqrt{R}$ every 5000 iterations afterward, and the
criterion $\sqrt{R} < 1.2$ is adopted as the stopping rule.
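
The scale reduction factor can be computed from parallel chains as sketched below. We assume here, following the usual Gelman and Rubin convention, that M is the number of iterations per chain and that B equals M times the sample variance of the chain means; the simulated chains are purely illustrative.

```python
import math
import random
from statistics import mean, variance

def scale_reduction(chains):
    """sqrt(R) with R = (M - 1)/M + B/(M * W) for equal-length chains."""
    n_runs = len(chains[0])                          # M
    chain_means = [mean(chain) for chain in chains]
    between = n_runs * variance(chain_means)         # B
    within = mean([variance(chain) for chain in chains])  # W
    return math.sqrt((n_runs - 1) / n_runs + between / (n_runs * within))

rng = random.Random(11)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 3.0, 6.0)]
print(f"well-mixed chains: {scale_reduction(mixed):.3f}")
print(f"separated chains:  {scale_reduction(stuck):.3f}")
```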

The results of analyzing the two AIDSVAX trials are based on the last 
5000 iterations of three parallel chains. A burn-in period of 5000 runs is 
enforced after the variances of proposal normal densities are fixed. To reduce 
the correlation within each successive chain, we loop over the last 5000 runs 
of the three parallel chains, and at each loop we randomly pick one chain to 
read in the samples. 
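
One way to read this pooling step is sketched below: for each of the retained iterations, one of the parallel chains is chosen at random and its value at that iteration is kept. The function name and the toy chains are our own; this is an interpretation of the description above rather than the original code.

```python
import random

def pool_across_chains(chains, rng):
    """At each retained iteration, keep the sample from one randomly chosen chain."""
    n_runs = len(chains[0])
    return [chains[rng.randrange(len(chains))][t] for t in range(n_runs)]

rng = random.Random(2)
toy_chains = [[10 * c + t for t in range(5)] for c in range(3)]  # 3 chains, 5 runs
pooled = pool_across_chains(toy_chains, rng)
print(pooled)
```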